{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3IrnqNnWO04A"
   },
   "source": [
    "# Ungraded Lab: Feature Engineering with Accelerometer Data\n",
    "\n",
    "This notebook demonstrates how to prepare time series data taken from an accelerometer. We will be using the [WISDM Human Activity Recognition Dataset](http://www.cis.fordham.edu/wisdm/dataset.php) for this example. This dataset can be used to predict the activity a user performs from a set of acceleration values recorded from the accelerometer of a smartphone.\n",
    "\n",
    "The dataset consists of x-, y-, and z-axis accelerometer readings recorded for 36 different users. A total of 6 activities were recorded: 'Walking', 'Jogging', 'Upstairs', 'Downstairs', 'Sitting', and 'Standing'. The sensors have a sampling rate of 20 Hz, which means 20 observations are recorded per second."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QDhCGkJoQDEN"
   },
   "source": [
    "## Install Packages\n",
    "\n",
    "As with the previous lab, you will install the `tensorflow_transform` Python package and its dependencies.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qONoNgHNWKt3"
   },
   "outputs": [],
   "source": [
    "!pip install tensorflow_transform==1.12.0\n",
    "!pip install apache-beam==2.40.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xvoAuLGMQXhL"
   },
   "source": [
    "*Note: In Google Colab, you need to restart the runtime at this point to finalize updating the packages you just installed. You can do so by clicking the `Restart Runtime` button at the end of the output cell above (after installation), or by selecting `Runtime > Restart Runtime` in the menu bar. **Please do not proceed to the next section without restarting.** You can ignore the errors about version incompatibility of some of the bundled packages because we won't be using those in this notebook.*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "iyHZtPotQhL0"
   },
   "source": [
    "## Imports\n",
    "\n",
    "Running the imports below should not show any error. Otherwise, please restart your runtime or re-run the package installation cell above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "b5isfzOcWbCl"
   },
   "outputs": [],
   "source": [
    "import apache_beam as beam\n",
    "print('Apache Beam version: {}'.format(beam.__version__))\n",
    "\n",
    "import tensorflow as tf\n",
    "print('TensorFlow version: {}'.format(tf.__version__))\n",
    "\n",
    "import tensorflow_transform as tft\n",
    "from tensorflow_transform import beam as tft_beam\n",
    "from tensorflow_transform.tf_metadata import dataset_metadata\n",
    "from tensorflow_transform.tf_metadata import schema_utils\n",
    "print('TensorFlow Transform version: {}'.format(tft.__version__))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ll_BqMx4QyP5"
   },
   "source": [
    "## Download the Data\n",
    "\n",
    "Next, you will download the data and put it in your workspace."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Lu4uJ4RgEfE8"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Directory of the raw data files\n",
    "DATA_DIR = '/content/data/'\n",
    "\n",
    "# Download the dataset\n",
    "!wget -nc https://github.com/https-deeplearning-ai/machine-learning-engineering-for-production-public/raw/main/course2/week4-ungraded-lab/data/WISDM_ar_latest.tar.gz -P {DATA_DIR}\n",
    "\n",
    "# Extract the dataset\n",
    "!tar -xvf {DATA_DIR}/WISDM_ar_latest.tar.gz -C {DATA_DIR}\n",
    "\n",
    "# Assign data path to a variable for easy reference\n",
    "INPUT_FILE = os.path.join(DATA_DIR, 'WISDM_ar_v1.1/WISDM_ar_v1.1_raw.txt')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "oKDJaZfEQ1yT"
   },
   "source": [
    "## Inspect the Data\n",
    "\n",
    "You should now inspect the dataset. You can start by previewing it as a dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MipCmynFWD47"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Put dataset in a dataframe\n",
    "df = pd.read_csv(INPUT_FILE, header=None, names=['user_id', 'activity', 'timestamp', 'x-acc','y-acc', 'z-acc'], on_bad_lines='skip')\n",
    "\n",
    "# Preview the first few rows\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "xuHH4Ffdz0wE"
   },
   "source": [
    "You might notice the semicolon at the end of each `z-acc` value. It causes the values to be read as strings, so you will want to remove it before analyzing your data. You will do this later in the pipeline with [beam.Map()](https://beam.apache.org/documentation/transforms/python/elementwise/map/). The `visualize_plots()` function below, which you will use in the next section, also takes care of it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "YK9tAyLVCgTe"
   },
   "outputs": [],
   "source": [
    "# Visualization Utilities\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def visualize_value_plots_for_categorical_feature(feature, colors=['b']):\n",
    "    '''Plots a bar graph for categorical features'''\n",
    "    counts = feature.value_counts()\n",
    "    plt.bar(counts.index, counts.values, color=colors)\n",
    "    plt.show()\n",
    "\n",
    "\n",
    "def visualize_plots(dataset, activity, columns):\n",
    "    '''Visualizes the accelerometer data against time'''\n",
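    "    # Copy the slice so the z-acc cleanup below does not modify a view of the original dataframe\n",
    "    features = dataset[dataset['activity'] == activity][columns][:200].copy()\n",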
    "    if 'z-acc' in columns:\n",
    "        # remove semicolons in the z-acc column\n",
    "        features['z-acc'] = features['z-acc'].replace(regex=True, to_replace=r';', value=r'')\n",
    "        features['z-acc'] = features['z-acc'].astype(np.float64)\n",
    "    axis = features.plot(subplots=True, figsize=(16, 12), \n",
    "                     title=activity)\n",
    "\n",
    "    for ax in axis:\n",
    "        ax.legend(loc='lower left', bbox_to_anchor=(1.0, 0.5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "vkvdKe6NP3EL"
   },
   "source": [
    "### Histogram of Activities\n",
    "\n",
    "You can now proceed with the visualizations. You can plot the histogram of activities and make your observations. For instance, you'll notice that there is more data for walking and jogging than other activities. This might have an effect on how your model learns each activity so you should take note of it. For example, you may want to collect more data for the other activities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "dva7JfmGDGQT"
   },
   "outputs": [],
   "source": [
    "# Plot the histogram of activities\n",
    "visualize_value_plots_for_categorical_feature(df['activity'], colors=['r', 'g', 'b', 'y', 'm', 'c'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Oj1bf2WZQHN8"
   },
   "source": [
    "### Histogram of Measurements per User\n",
    "You can also observe the number of measurements taken per user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8jJhG5DMFj-d"
   },
   "outputs": [],
   "source": [
    "# Plot the histogram for users\n",
    "visualize_value_plots_for_categorical_feature(df['user_id'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "f6CeWHKg3JC3"
   },
   "source": [
    "You can consult with field experts on which users should be part of the training set or the test set. For this lab, you will just do a simple partition: user ids 1 to 30 go to the train set, and the rest go to the test set. Here is the `partition_fn` you will use with `beam.Partition()` later."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "n800M3nz3DFh"
   },
   "outputs": [],
   "source": [
    "def partition_fn(line, num_partitions):\n",
    "  '''\n",
    "  Partition function to work with beam.Partition\n",
    "\n",
    "  Args:\n",
    "    line (bytes) - One record in the CSV file.\n",
    "    num_partitions (integer) - Number of partitions. Required argument by Beam. Unused in this function.\n",
    "\n",
    "  Returns:\n",
    "    0 or 1 (integer) - 0 (train) if user id is 30 or below, 1 (test) otherwise.\n",
    "  '''\n",
    "\n",
    "  # Get the 1st substring delimited by a comma. Cast to an int.\n",
    "  user_id = int(line[:line.index(b',')])\n",
    "\n",
    "  # User ids 1 to 30 go to partition 0 (train), the rest go to partition 1 (test)\n",
    "  partition_num = int(user_id > 30)\n",
    "\n",
    "  return partition_num"
   ]
  },
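  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The split rule can be checked on a few hand-written records before it goes into the pipeline. The cell below is a minimal sketch, not part of the lab's pipeline: `sketch_partition` is a hypothetical stand-in that mirrors the intended rule (user ids 1 to 30 go to partition 0 for training, the rest go to partition 1 for testing), and the sample byte strings are made up to follow the column order of the raw file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def sketch_partition(line, num_partitions=2):\n",
    "    '''Hypothetical stand-in: 0 (train) for user ids up to 30, 1 (test) otherwise.'''\n",
    "    # The user id is everything before the first comma\n",
    "    user_id = int(line[:line.index(b',')])\n",
    "    return int(user_id > 30)\n",
    "\n",
    "# Made-up records in the raw file's column order:\n",
    "# user_id, activity, timestamp, x-acc, y-acc, z-acc\n",
    "sketch_lines = [\n",
    "    b'1,Walking,100,0.1,9.8,0.3',\n",
    "    b'30,Jogging,200,5.0,11.2,-1.0',\n",
    "    b'31,Sitting,300,0.0,9.7,0.1',\n",
    "]\n",
    "\n",
    "# Users 1 and 30 land in partition 0, user 31 in partition 1\n",
    "sketch_partitions = [sketch_partition(line) for line in sketch_lines]\n",
    "print(sketch_partitions)  # [0, 0, 1]"
   ]
  },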
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "gY9wwg10QrXo"
   },
   "source": [
    "### Acceleration per Activity\n",
    "\n",
    "Finally, you can plot the sensor measurements over time. Acceleration should vary much more for activities like jogging than for sitting, which is the expected behaviour. If this is not the case, there might be a problem with the sensor, which could make the data invalid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "sLa8UAe3JDZ3"
   },
   "outputs": [],
   "source": [
    "# Plot the measurements for `Jogging`\n",
    "visualize_plots(df, 'Jogging', columns=['x-acc', 'y-acc', 'z-acc'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Z1CvWGpEQwKO"
   },
   "outputs": [],
   "source": [
    "visualize_plots(df, 'Sitting', columns=['x-acc', 'y-acc', 'z-acc'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EjZDkX-L4s-B"
   },
   "source": [
    "## Declare Schema for Cleaned Data\n",
    "\n",
    "As usual, you will want to declare a schema to make sure that your data input is parsed correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "QeUDSzCr4y5F"
   },
   "outputs": [],
   "source": [
    "STRING_FEATURES = ['activity']\n",
    "INT_FEATURES = ['user_id', 'timestamp']\n",
    "FLOAT_FEATURES = ['x-acc', 'y-acc', 'z-acc']\n",
    "\n",
    "# Declare feature spec\n",
    "RAW_DATA_FEATURE_SPEC = dict(\n",
    "    [(name, tf.io.FixedLenFeature([], tf.string))\n",
    "     for name in STRING_FEATURES] +\n",
    "    [(name, tf.io.FixedLenFeature([], tf.int64))\n",
    "     for name in INT_FEATURES] +\n",
    "    [(name, tf.io.FixedLenFeature([], tf.float32))\n",
    "     for name in FLOAT_FEATURES]\n",
    ")\n",
    "\n",
    "# Create schema from feature spec\n",
    "RAW_DATA_SCHEMA = tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KXlu3wOBuWor"
   },
   "source": [
    "## Create a `tf.Transform` preprocessing_fn\n",
    "\n",
    "You can then define your preprocessing function. For this exercise, you will scale the float features by their min-max values and create a vocabulary lookup for the string label. You will also discard features that you will not need in the model such as the user id and timestamp."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "neDQbbY30Sdu"
   },
   "outputs": [],
   "source": [
    "LABEL_KEY = 'activity'\n",
    "\n",
    "def preprocessing_fn(inputs):\n",
    "  \"\"\"Preprocess input columns into transformed columns.\"\"\"\n",
    "\n",
    "  # Copy inputs\n",
    "  outputs = inputs.copy()\n",
    "\n",
    "  # Delete features not to be included as inputs to the model\n",
    "  del outputs[\"user_id\"]\n",
    "  del outputs[\"timestamp\"]\n",
    "  \n",
    "  # Create a vocabulary for the string labels\n",
    "  outputs[LABEL_KEY] = tft.compute_and_apply_vocabulary(inputs[LABEL_KEY],vocab_filename=LABEL_KEY)\n",
    "\n",
    "  # Scale features by their min-max\n",
    "  for key in FLOAT_FEATURES:\n",
    "    outputs[key] = tft.scale_by_min_max(outputs[key])\n",
    "\n",
    "  return outputs"
   ]
  },
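  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two transforms in `preprocessing_fn` more concrete, the cell below sketches what they compute on a handful of made-up values. It uses NumPy and the standard library rather than `tf.Transform`, so it is an illustration of the arithmetic and not the library's implementation. Two assumptions are worth stating: min-max scaling maps values to the default output range of 0 to 1, and vocabulary indices are assigned in order of decreasing frequency. In the real pipeline these statistics are computed over the whole training set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from collections import Counter\n",
    "\n",
    "# Min-max scaling: (x - min) / (max - min)\n",
    "sketch_x = np.array([-2.0, 0.0, 6.0])  # made-up acceleration values\n",
    "sketch_scaled = (sketch_x - sketch_x.min()) / (sketch_x.max() - sketch_x.min())\n",
    "print(sketch_scaled)\n",
    "\n",
    "# Vocabulary lookup: the most frequent label gets index 0, the next gets 1, and so on\n",
    "sketch_labels = ['Walking', 'Jogging', 'Walking', 'Sitting', 'Walking', 'Jogging']\n",
    "sketch_vocab = [label for label, _ in Counter(sketch_labels).most_common()]\n",
    "sketch_index = {label: i for i, label in enumerate(sketch_vocab)}\n",
    "sketch_ids = [sketch_index[label] for label in sketch_labels]\n",
    "print(sketch_vocab)\n",
    "print(sketch_ids)"
   ]
  },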
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3SSRMC37SkyI"
   },
   "source": [
    "## Transform the data\n",
    "\n",
    "Now you're ready to start transforming the data in an Apache Beam pipeline using TensorFlow Transform. It will follow these major steps:\n",
    "\n",
    "1. Read in the data using `beam.io.ReadFromText`\n",
    "1. Clean it using `beam.Map` and `beam.Filter`\n",
    "1. Transform it using a preprocessing pipeline that scales numeric data and converts categorical data from strings to int64 indices.\n",
    "1. Write out the result as a `TFRecord` of `Example` protos, which can be used for training a model later."
   ]
  },
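  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cleaning step in the list above can be previewed in plain Python before it is wired into Beam. The cell below is a minimal sketch under stated assumptions: `sketch_clean` and `sketch_is_complete` are hypothetical helpers that apply the same byte-level rules as the pipeline's `beam.Map` and `beam.Filter` lambdas, and the input lines are made up to show one well-formed record and a few malformed ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SKETCH_NUM_COLUMNS = 6  # user_id, activity, timestamp, x-acc, y-acc, z-acc\n",
    "\n",
    "def sketch_clean(line):\n",
    "    '''Remove the trailing semicolon, with or without a comma in front of it.'''\n",
    "    return line.replace(b',;', b'').replace(b';', b'')\n",
    "\n",
    "def sketch_is_complete(line):\n",
    "    '''Keep rows with exactly six comma-separated fields and a non-empty last field.'''\n",
    "    return line.count(b',') == SKETCH_NUM_COLUMNS - 1 and line[-1:] != b','\n",
    "\n",
    "# Made-up input lines\n",
    "sketch_raw = [\n",
    "    b'33,Jogging,100,-0.69,12.68,0.50;',  # well-formed record\n",
    "    b'11,Walking,200,4.4,4.4,;',          # z-acc missing before the semicolon\n",
    "    b'11,Walking,300,4.4,4.4,',           # z-acc missing, no semicolon\n",
    "    b'5,Sitting,400,1.0,2.0,3.0,;',       # extra comma before the semicolon\n",
    "]\n",
    "\n",
    "sketch_cleaned = [sketch_clean(line) for line in sketch_raw]\n",
    "sketch_kept = [line for line in sketch_cleaned if sketch_is_complete(line)]\n",
    "for line in sketch_kept:\n",
    "    print(line)"
   ]
  },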
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xJ_eE8nntSSs"
   },
   "outputs": [],
   "source": [
    "import shutil\n",
    "from tfx_bsl.coders.example_coder import RecordBatchToExamplesEncoder\n",
    "from tfx_bsl.public import tfxio\n",
    "\n",
    "# Directory names for the TF Transform outputs\n",
    "WORKING_DIR = 'transform_dir'\n",
    "TRANSFORM_TRAIN_FILENAME = 'transform_train'\n",
    "TRANSFORM_TEST_FILENAME = 'transform_test'\n",
    "TRANSFORM_TEMP_DIR = 'tft_temp'\n",
    "\n",
    "ordered_columns = ['user_id', 'activity', 'timestamp', 'x-acc','y-acc', 'z-acc']\n",
    "\n",
    "def transform_data(working_dir):\n",
    "    '''\n",
    "    Reads a CSV File and preprocesses the data using TF Transform\n",
    "\n",
    "    Args:\n",
    "      working_dir (string) - directory to place TF Transform outputs\n",
    "    \n",
    "    Returns:\n",
    "      transform_fn - transformation graph\n",
    "      transformed_train_data - transformed training examples\n",
    "      transformed_test_data - transformed test examples\n",
    "      transformed_metadata - transform output metadata\n",
    "    '''\n",
    "\n",
    "    # Delete the TF Transform working directory if it already exists\n",
    "    if os.path.exists(working_dir):\n",
    "      shutil.rmtree(working_dir)\n",
    "\n",
    "    with beam.Pipeline() as pipeline:\n",
    "        with tft_beam.Context(temp_dir=os.path.join(working_dir, TRANSFORM_TEMP_DIR)):\n",
    "  \n",
    "          # Read the input CSV, then clean and filter the data (remove semicolons and drop incomplete rows)\n",
    "          raw_data = (\n",
    "              pipeline\n",
    "              | 'ReadTrainData' >> beam.io.ReadFromText(INPUT_FILE, coder=beam.coders.BytesCoder())\n",
    "              | 'CleanLines' >> beam.Map(lambda line: line.replace(b',;', b'').replace(b';', b''))\n",
    "              | 'FilterLines' >> beam.Filter(lambda line: line.count(b',') == len(ordered_columns) - 1 and line[-1:] != b','))\n",
    "\n",
    "          # Partition the data into training and test data using beam.Partition\n",
    "          raw_train_data, raw_test_data = (raw_data\n",
    "                                  | 'TrainTestSplit' >> beam.Partition(partition_fn, 2))\n",
    "                    \n",
    "          # Create a TFXIO to read the data with the schema. \n",
    "          csv_tfxio = tfxio.BeamRecordCsvTFXIO(\n",
    "              physical_format='text',\n",
    "              column_names=ordered_columns,\n",
    "              schema=RAW_DATA_SCHEMA)\n",
    "\n",
    "          # Parse the raw train data into inputs for TF Transform\n",
    "          raw_train_data = (raw_train_data \n",
    "                            | 'DecodeTrainData' >> csv_tfxio.BeamSource())\n",
    "\n",
    "          # Get the raw data metadata\n",
    "          RAW_DATA_METADATA = csv_tfxio.TensorAdapterConfig()\n",
    "          \n",
    "          # Pair the train data with the metadata into a tuple\n",
    "          raw_train_dataset = (raw_train_data, RAW_DATA_METADATA)\n",
    "\n",
    "          # Training data transformation. The TFXIO (RecordBatch) output format\n",
    "          # is chosen for improved performance.\n",
    "          (transformed_train_data,transformed_metadata) , transform_fn = (\n",
    "              raw_train_dataset \n",
    "                | 'AnalyzeAndTransformTrainData' >> tft_beam.AnalyzeAndTransformDataset(preprocessing_fn, output_record_batches=True))\n",
    "          \n",
    "          # Parse the raw test data into inputs for TF Transform\n",
    "          raw_test_data = (raw_test_data \n",
    "                            |'DecodeTestData' >> csv_tfxio.BeamSource())\n",
    "\n",
    "          # Pair the test data with the metadata into a tuple\n",
    "          raw_test_dataset = (raw_test_data, RAW_DATA_METADATA)\n",
    "\n",
    "          # Now apply the same transform function to the test data.\n",
    "          # You don't need the transformed data schema. It's the same as before.\n",
    "          transformed_test_data, _ = ((raw_test_dataset, transform_fn) \n",
    "                                        | 'AnalyzeAndTransformTestData' >> tft_beam.TransformDataset(output_record_batches=True))\n",
    "          \n",
    "          # Declare an encoder to convert output record batches to TF Examples \n",
    "          transformed_data_coder = RecordBatchToExamplesEncoder(transformed_metadata.schema)\n",
    "\n",
    "          \n",
    "          # Encode transformed train data and write to disk\n",
    "          _ = (\n",
    "              transformed_train_data\n",
    "              | 'EncodeTrainData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))\n",
    "              | 'WriteTrainData' >> beam.io.WriteToTFRecord(\n",
    "                  os.path.join(working_dir, TRANSFORM_TRAIN_FILENAME)))\n",
    "\n",
    "          # Encode transformed test data and write to disk\n",
    "          _ = (\n",
    "              transformed_test_data\n",
    "              | 'EncodeTestData' >> beam.FlatMapTuple(lambda batch, _: transformed_data_coder.encode(batch))\n",
    "              | 'WriteTestData' >> beam.io.WriteToTFRecord(\n",
    "                  os.path.join(working_dir, TRANSFORM_TEST_FILENAME)))\n",
    "          \n",
    "          # Write transform function to disk\n",
    "          _ = (\n",
    "            transform_fn\n",
    "            | 'WriteTransformFn' >>\n",
    "            tft_beam.WriteTransformFn(os.path.join(working_dir)))\n",
    "\n",
    "    return transform_fn, transformed_train_data, transformed_test_data, transformed_metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "8GPbh3qKzyTb"
   },
   "outputs": [],
   "source": [
    "def main():\n",
    "  return transform_data(WORKING_DIR)\n",
    "\n",
    "if __name__ == '__main__':\n",
    "  transform_fn, transformed_train_data,transformed_test_data, transformed_metadata = main()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "-CqjN6qrTzQ5"
   },
   "source": [
    "## Prepare Training and Test Datasets from TFTransformOutput\n",
    "\n",
    "Now that you have the transformed examples, you need to prepare the dataset windows for this time series data. As discussed in class, you want to group a series of measurements so that the group becomes the feature for a particular label. In this case, it makes sense to group consecutive measurements and use them as the indicator for an activity. For example, if you take 80 measurements and they oscillate greatly (as in the visualizations earlier in this notebook), then the model should be able to tell that it is a 'Jogging' activity. You will implement that in the following cells using the [tf.data.Dataset.window()](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#window) method.\n",
    "\n",
    "Notice that there is an `add_mode()` function. If you remember what the original CSV looks like, each row has its own activity label. To associate a single activity with a group of 80 measurements, you take the activity that appears most often in those rows (e.g. if 75 elements of the window point to `Walking` and only 5 point to `Sitting`, then the entire window is labeled `Walking`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "XQxYFAXkNUwP"
   },
   "outputs": [],
   "source": [
    "# Get the output of the Transform component\n",
    "tf_transform_output = tft.TFTransformOutput(os.path.join(WORKING_DIR))\n",
    "\n",
    "# Parameters\n",
    "HISTORY_SIZE = 80\n",
    "BATCH_SIZE = 100\n",
    "STEP_SIZE = 40\n",
    "\n",
    "def parse_function(example_proto):\n",
    "    '''Parse the values from tf examples'''\n",
    "    feature_spec = tf_transform_output.transformed_feature_spec()\n",
    "    features = tf.io.parse_single_example(example_proto, feature_spec)\n",
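    "    # Put the activity index in column 0, followed by the accelerometer axes.\n",
    "    # Everything is cast to float32 so the values can be stacked into one tensor.\n",
    "    values = [tf.cast(features[key], tf.float32) for key in [LABEL_KEY] + FLOAT_FEATURES]\n",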
    "    features = tf.stack(values, axis=0)\n",
    "    return features\n",
    "\n",
    "def add_mode(features):\n",
    "    '''Calculate mode of activity for the current history size of elements'''\n",
    "    unique, _, count = tf.unique_with_counts(features[:,0])\n",
    "    # Pick the most frequent activity. tf.argmax returns a single index, so a\n",
    "    # tie between activities still yields one scalar label for the window.\n",
    "    mode = tf.gather(unique, tf.argmax(count))\n",
    "\n",
    "    # Features (X) are all features except activity (x-acc, y-acc, z-acc)\n",
    "    # Target (Y) is the mode of activity values of all rows in this window\n",
    "    return (features[:,1:], mode)\n",
    "\n",
    "def get_windowed_dataset(path):\n",
    "  '''Get the dataset and group them into windows'''\n",
    "  dataset = tf.data.TFRecordDataset(path)\n",
    "  dataset = dataset.map(parse_function)\n",
    "  dataset = dataset.window(HISTORY_SIZE, shift=STEP_SIZE, drop_remainder=True)\n",
    "  dataset = dataset.flat_map(lambda window: window.batch(HISTORY_SIZE))\n",
    "  dataset = dataset.map(add_mode)\n",
    "  dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)\n",
    "  dataset = dataset.repeat()\n",
    "\n",
    "  return dataset"
   ]
  },
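  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The windowing idea used in the cell above can be illustrated on a tiny list without TensorFlow. This is a minimal sketch under stated assumptions: `sketch_windows` is a hypothetical helper that slides a fixed-size window with a fixed shift and drops any incomplete window at the end, and the six rows are made up. A window size of 4 and a shift of 2 stand in for the lab's 80 and 40."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "def sketch_windows(rows, size, shift):\n",
    "    '''Yield full windows of `size` rows, advancing by `shift` rows each time.'''\n",
    "    for start in range(0, len(rows) - size + 1, shift):\n",
    "        yield rows[start:start + size]\n",
    "\n",
    "# Made-up (activity, measurement) rows\n",
    "sketch_rows = [('Walking', 0.1), ('Walking', 0.2), ('Walking', 0.3),\n",
    "               ('Sitting', 0.0), ('Sitting', 0.0), ('Sitting', 0.1)]\n",
    "\n",
    "sketch_result = []\n",
    "for window in sketch_windows(sketch_rows, size=4, shift=2):\n",
    "    labels = [label for label, _ in window]\n",
    "    values = [value for _, value in window]\n",
    "    # The window's target is the label that appears most often in it\n",
    "    majority = Counter(labels).most_common(1)[0][0]\n",
    "    sketch_result.append((values, majority))\n",
    "\n",
    "for values, majority in sketch_result:\n",
    "    print(values, majority)"
   ]
  },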
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "i0grFNuaTsk3"
   },
   "outputs": [],
   "source": [
    "# Get list of train and test data tfrecord filenames from the transform outputs\n",
    "train_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TRAIN_FILENAME + '*'))\n",
    "test_tfrecord_files = tf.io.gfile.glob(os.path.join(WORKING_DIR, TRANSFORM_TEST_FILENAME + '*'))\n",
    "\n",
    "# Generate dataset windows\n",
    "windowed_train_dataset = get_windowed_dataset(train_tfrecord_files[0])\n",
    "windowed_test_dataset = get_windowed_dataset(test_tfrecord_files[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "89KG2jANSWt4"
   },
   "outputs": [],
   "source": [
    "# Preview an example in the train dataset\n",
    "for x, y in windowed_train_dataset.take(1):\n",
    "  print(\"\\nFeatures (x-acc, y-acc, z-acc):\\n\")\n",
    "  print(x)\n",
    "  print(\"\\nTarget (activity):\\n\")\n",
    "  print(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "uOXmSO4YTRWv"
   },
   "source": [
    "You should see a sample of a dataset window above. There are 80 consecutive measurements of `x-acc`, `y-acc`, and `z-acc` that correspond to a single labeled activity. Moreover, you also set it up to be in batches of 100 windows. This can now be fed to train an LSTM so it can learn how to detect activities based on 80-measurement windows. You can also preview a sample in the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "klu7t6QGlhSc"
   },
   "outputs": [],
   "source": [
    "# Preview an example in the test dataset\n",
    "for x, y in windowed_test_dataset.take(1):\n",
    "  print(\"\\nFeatures (x-acc, y-acc, z-acc):\\n\")\n",
    "  print(x)\n",
    "  print(\"\\nTarget (activity):\\n\")\n",
    "  print(y)"
   ]
  },
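  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It can help to translate the window parameters into time spans and tensor shapes. The cell below is a small arithmetic sketch using the values stated in this notebook (20 Hz sampling, 80-row windows, a shift of 40 rows, batches of 100 windows, 3 axes). The `SKETCH_` names are introduced here for illustration only, and the durations assume that the rows in a window are consecutive samples from a single recording."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "SKETCH_SAMPLING_RATE_HZ = 20  # observations per second, from the dataset description\n",
    "SKETCH_HISTORY_SIZE = 80      # rows per window\n",
    "SKETCH_STEP_SIZE = 40         # rows between the starts of consecutive windows\n",
    "SKETCH_BATCH_SIZE = 100       # windows per batch\n",
    "SKETCH_NUM_AXES = 3           # x-acc, y-acc, z-acc\n",
    "\n",
    "# Duration covered by one window and the time between window starts\n",
    "sketch_window_seconds = SKETCH_HISTORY_SIZE / SKETCH_SAMPLING_RATE_HZ\n",
    "sketch_step_seconds = SKETCH_STEP_SIZE / SKETCH_SAMPLING_RATE_HZ\n",
    "\n",
    "# Fraction of each window shared with the next one\n",
    "sketch_overlap = 1 - SKETCH_STEP_SIZE / SKETCH_HISTORY_SIZE\n",
    "\n",
    "# Expected shapes of one batch of features and labels\n",
    "sketch_feature_shape = (SKETCH_BATCH_SIZE, SKETCH_HISTORY_SIZE, SKETCH_NUM_AXES)\n",
    "sketch_label_shape = (SKETCH_BATCH_SIZE,)\n",
    "\n",
    "print(sketch_window_seconds, sketch_step_seconds, sketch_overlap)\n",
    "print(sketch_feature_shape, sketch_label_shape)"
   ]
  },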
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "baQ6Blk0Tire"
   },
   "source": [
    "## Wrap Up\n",
    "\n",
    "In this lab, you prepared time series data from an accelerometer by transforming the features and grouping them into windows that a model can use to make predictions. The same concept can be applied to any data where you need to take a few seconds of measurements before the model makes a prediction."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "C2_W4_Lab_2_Signals.ipynb",
   "private_outputs": true,
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
